A systematic comparison and evaluation of biclustering methods for gene expression data

Abstract

Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters, or with respect to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of biclustering method are currently available.

Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough that all optimal groupings can be determined exactly; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.

Availability: The datasets used, the outcomes of the biclustering algorithms and the Bimax implementation for the reference model are available at

Contact: bleuler@tik.ee.ethz.ch

Supplementary information: Supplementary data are available at
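To make the binary reference model mentioned under Results more concrete, the following is a minimal sketch, not the authors' Bimax implementation. It assumes (this detail is not spelled out in the abstract) that the expression matrix is first binarized with a threshold and that a bicluster is an inclusion-maximal set of rows and columns whose entries are all 1. Instead of the divide-and-conquer strategy, it uses a brute-force enumeration over column subsets, which is only practical for tiny matrices. The function names `binarize` and `maximal_biclusters` are hypothetical.

```python
from itertools import combinations


def binarize(matrix, threshold):
    """Assumed preprocessing: 1 where the value reaches the threshold, else 0."""
    return [[1 if value >= threshold else 0 for value in row] for row in matrix]


def maximal_biclusters(binary):
    """Enumerate inclusion-maximal all-ones submatrices by brute force.

    This is NOT Bimax: it tries every non-empty column subset, so the
    running time is exponential in the number of columns.
    Returns a set of (rows, cols) pairs of frozensets of indices.
    """
    n_rows = len(binary)
    n_cols = len(binary[0]) if binary else 0
    found = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            # Rows that contain a 1 in every chosen column.
            rows = frozenset(
                i for i in range(n_rows) if all(binary[i][j] for j in cols)
            )
            if not rows:
                continue
            # Extend the column set to every column shared by those rows,
            # which makes the (rows, columns) pair inclusion-maximal.
            closed_cols = frozenset(
                j for j in range(n_cols) if all(binary[i][j] for i in rows)
            )
            found.add((rows, closed_cols))
    return found


if __name__ == "__main__":
    toy = [
        [1, 1, 0],
        [1, 1, 0],
        [0, 1, 1],
    ]
    for rows, cols in sorted(maximal_biclusters(toy), key=lambda b: sorted(b[0])):
        print(sorted(rows), sorted(cols))
```

On the toy matrix, rows 0 and 1 share columns 0 and 1, all three rows share column 1, and row 2 alone covers columns 1 and 2, so three maximal biclusters are reported. A divide-and-conquer method such as Bimax aims to find the same kind of groupings without visiting every column subset.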